Abstract

This paper highlights several areas where graphical techniques can be harnessed
to address the problem of measurement errors in causal inference. In particular,
the paper discusses the control of partially observable confounders in parametric and
nonparametric models and the computational problem of obtaining bias-free effect
estimates in such models.


1  Introduction 

This paper discusses methods of dealing with measurement errors in the context of
graph-based causal inference. My motivation for tackling this problem was sparked by a
remarkable result that I discovered a few months ago in (Greenland and Lash, 2008),^1
which I believe should open new vistas of possibilities for graphical modelers.

Consider the problem of estimating the causal effect of X on Y when a sufficient set Z
of confounders can only be measured with error (see Fig. 1), via a proxy set W. Since Z
is assumed sufficient, the causal effect is identified from measurements on X, Y, and Z, and
can be written

$$P(y \mid do(x)) = \sum_z P(y \mid x, z)\, P(z) \tag{1}$$

However, if Z is unobserved, and W is but a noisy measurement of Z, d-separation tells
us immediately that adjusting for W is inadequate, for it leaves the back-door path(s)
X ← Z → Y unblocked.^2 Therefore, regardless of sample size, the effect of X on Y cannot
be estimated without bias. It turns out, however, that if we are given the conditional
probabilities P(w|z) that govern the error mechanism, we can perform a modified adjustment
for W that, in the limit of a very large sample, would amount to the same thing as observing
and adjusting for Z itself, thus rendering the causal effect identifiable.

^1 Earlier works include Greenland and Kleinbaum (1983); Selen (1986); and Greenland (1988).

^2 For concise definitions and descriptions of graphical concepts such as “d-separation” and “back-door”
see Pearl, 2009, pp. 335-6, 344-5.
Figure 1: Needed: the causal effect of X on Y when Z is unobserved, and W provides a noisy
measurement of Z.

The possibility of removing bias by modified adjustment came as a surprise to me,
because, although P(w|z) is assumed given, the actual value of the confounder Z remains
uncertain for each measurement W = w, so one would expect to get either a distribution
over causal effects, or bounds thereof. Not so; we actually get a repaired point estimate of
P(y|do(x)) that is asymptotically unbiased.

This remarkable result, which I will label “effect restoration,” has powerful consequences
in practice because, when P(w|z) is not given, one can resort to a Bayesian (or bounding)
analysis and assume a prior distribution (or bounds) on the parameters of P(w|z) which
would yield a distribution (or bounds) over P(y|do(x)) (Greenland, 2007). Alternatively, if
costs permit, one can estimate P(w|z) by re-testing Z in a sampled subpopulation.^3

On the surface, the possibility of correcting for measurement bias seems to undermine
the importance of accurate measurements. It suggests that as long as we know how bad our
measurements are there is no need to improve them, because they can be corrected post hoc
by analytical means. This is not so. First, although an unbiased effect estimate can be
recovered from noisy measurements, sampling variability increases substantially with error.
Second, even assuming unbounded sample size, the estimate will be biased if the postulated
P(w|z) is incorrect.^4

Effect restoration can be analyzed from either a statistical or a causal viewpoint. Taking
the statistical view, one may argue that, once the effect P(y|do(x)) is identified in terms of
a latent variable Z and given the estimand in (1), the problem is no longer one of causal
inference, but rather of regression analysis, whereby the regressional expression E_z P(y|x, z)
needs to be estimated from a noisy measurement of Z, given by W. This is indeed the
approach taken in the vast literature on measurement error (e.g., (Selen, 1986; Carroll
et al., 2006)).

The causal analytic perspective is different; it maintains that the ultimate purpose of
the analysis is not the statistics of X, Y, and Z, as is normally assumed in the measurement
literature, but a causal quantity P(y|do(x)) that is mapped into regression vocabulary
only when certain causal assumptions are deemed plausible. Moreover, the very idea of
modeling the error mechanism P(w|z) requires causal considerations; errors caused by noisy
measurements are fundamentally different from those caused by noisy agitators. Indeed,
the reason we seek an estimate of P(w|z) as opposed to P(z|w), be it from judgment or
from pilot studies, is that we consider the former to be a more reliable and transportable
parameter than the latter. Transportability is a causal notion that is hardly touched upon
in the statistical measurement literature.

^3 In the literature on measurement errors and sensitivity analysis, this sort of exercise is normally done by
re-calibration techniques (Greenland and Lash, 2008). The latter employs a “validation study” in which Z
is measured without error in a subpopulation and used to calibrate the estimates in the main study (Selen,
1986).

^4 In extreme cases, a wrongly postulated P(w|z) may conflict with the data, and no estimate will be obtained.
For example, if we postulate a non-informative W, P(w|z) = P(w), and we find that W strongly depends
on X, a contradiction arises and no effect estimate will emerge.

Viewed from this perspective, the measurement error literature appears to be engaged
(unwittingly) in a causal inference exercise that can benefit substantially from making
the causal framework explicit. The benefit can in fact be mutual; identifiability with
partially specified causal parameters (as in Fig. 1) is rarely discussed in the causal inference
literature (notable exceptions are (Hernán and Cole, 2009) and (Cai and Kuroki, 2008)),
while graphical models are hardly used in the measurement error literature.

In this paper we will consider the mathematical aspects of effect restoration and will
focus on asymptotic analysis. Our aims are to understand the conditions under which effect
restoration is feasible, to assess the computational problems it presents, and to identify
those features of P(w|z) and P(x, y, w) that are major contributors to measurement bias,
and those that contribute to robustness against bias.

2  Effect  Restoration  by  Matrix  Adjustment 

The main idea, adapted from (Greenland and Lash, 2008, p. 360), is as follows: Starting
with the joint probability P(x, y, z, w), and assuming that W depends only on Z,^5 i.e.,

$$P(w \mid x, y, z) = P(w \mid z) \tag{2}$$

we write

$$\begin{aligned}
P(x, y, w) &= \sum_z P(x, y, z, w) \\
           &= \sum_z P(w \mid x, y, z)\, P(x, y, z) \\
           &= \sum_z P(w \mid z)\, P(x, y, z)
\end{aligned}$$

For each x and y, we can interpret the transformation above as a vector-matrix
multiplication:

$$V(w) = \sum_z M(w, z)\, V(z)$$

^5 This assumption goes under a rather strange rubric: “non-differential error” (Carroll et al., 2006).
where V(w) = P(x, y, w), V(z) = P(x, y, z), and M(w, z) = P(w|z) is a stochastic matrix
(i.e., its entries are non-negative and, for each z, sum to one over w). It is well known
that, under fairly broad conditions, M has an inverse (call it I), which allows us to write:

$$P(x, y, z) = \sum_w I(z, w)\, P(x, y, w) \tag{3}$$

We are done now, because (3) enables us to reconstruct the joint distribution of X, Y,
and Z from that of the observed variables, X, Y, and W. Thus, each term on the right
hand side of (1) can be obtained from P(x, y, w) through (3) and, assuming Z is a sufficient
set (i.e., satisfying the back-door test), P(y|do(x)) is estimable from the available data.
Explicitly, we have:

$$\begin{aligned}
P(y \mid do(x)) &= \sum_z P(x, y, z)\, P(z) / P(x, z) \\
  &= \sum_z P(x, y, z)\, \frac{\sum_{x', y'} P(x', y', z)}{\sum_{y'} P(x, y', z)} \\
  &= \sum_z \Big[ \sum_w I(z, w)\, P(x, y, w) \Big]
     \frac{\sum_{x', y', w} I(z, w)\, P(x', y', w)}{\sum_{y', w} I(z, w)\, P(x, y', w)}
\end{aligned} \tag{4}$$

Note that the same inverse matrix, I, appears in all summations. This will not be the
case when we do not assume independent noise mechanisms. In other words, if (2) does not
hold, we must write:


$$\begin{aligned}
P(x, y, w) &= \sum_z P(w \mid x, y, z)\, P(x, y, z) \\
           &= \sum_z M_{xy}(w, z)\, P(x, y, z)
\end{aligned}$$

where M_xy and its inverse I_xy are both indexed by the specific values of x and y, and we
then obtain:

$$P(x, y, z) = \sum_w I_{xy}(z, w)\, P(x, y, w) \tag{5}$$

which, again, permits the identification of the causal effect via (5) except that the expression
becomes somewhat more complicated. It is also clear that errors in the measurement of X
and Y can be absorbed into a vector W, and do not present any conceptual problem.

Equation (4) demonstrates the feasibility of effect reconstruction and proves that,
despite the uncertainty in the variables X, Y and Z, the causal effect is identifiable once we
know the statistics of the error mechanism.
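The matrix adjustment can be made concrete with a small numerical sketch. The code below is only an illustration under stated assumptions: the function names (`invert`, `restore_joint`, `back_door`) and all probabilities in the toy model are invented here, variables are discrete and coded 0, 1, ..., and the population distribution P(x, y, w) is taken as known exactly, so sampling variability is ignored.

```python
from itertools import product

def invert(m):
    """Gauss-Jordan inverse of a small square matrix given as a list of rows."""
    n = len(m)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("M is singular: the error mechanism cannot be inverted")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * u for v, u in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def restore_joint(p_xyw, M):
    """Eq. (3): P(x,y,z) = sum_w I(z,w) P(x,y,w), where I inverts M[w][z] = P(w|z)."""
    I = invert(M)
    n = len(M)
    pairs = {(x, y) for (x, y, _) in p_xyw}
    return {(x, y, z): sum(I[z][w] * p_xyw[(x, y, w)] for w in range(n))
            for (x, y) in pairs for z in range(n)}

def back_door(p, x, y):
    """Eq. (1) on a joint table p[(x, y, c)]: sum_c P(x,y,c) P(c) / P(x,c)."""
    total = 0.0
    for c in {key[2] for key in p}:
        p_c = sum(v for k, v in p.items() if k[2] == c)
        p_xc = sum(v for k, v in p.items() if k[0] == x and k[2] == c)
        total += p[(x, y, c)] * p_c / p_xc
    return total

# Hypothetical toy model; every number below is invented for illustration.
p_z = [0.6, 0.4]
p_x1_given_z = [0.2, 0.7]                                  # P(X=1 | z)
p_y1_given_xz = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 0.8}
M = [[0.9, 0.2],                                           # M[w][z] = P(w | z)
     [0.1, 0.8]]

true_xyz = {}
for x, y, z in product((0, 1), repeat=3):
    px = p_x1_given_z[z] if x else 1 - p_x1_given_z[z]
    py = p_y1_given_xz[(x, z)] if y else 1 - p_y1_given_xz[(x, z)]
    true_xyz[(x, y, z)] = p_z[z] * px * py

# What the analyst actually observes: P(x, y, w) = sum_z P(w|z) P(x, y, z).
observed_xyw = {(x, y, w): sum(M[w][z] * true_xyz[(x, y, z)] for z in (0, 1))
                for x, y, w in product((0, 1), repeat=3)}

truth = sum(p_y1_given_xz[(1, z)] * p_z[z] for z in (0, 1))   # P(Y=1 | do(X=1))
naive = back_door(observed_xyw, x=1, y=1)                     # adjusts for W as if it were Z
restored = back_door(restore_joint(observed_xyw, M), x=1, y=1)
```

In this toy model the restored estimate coincides with the true effect, while plain adjustment for W does not; with finite samples the inversion would of course amplify sampling noise.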

This result is reassuring, but presents practical challenges of representation,
computation, and estimation. Given the potentially high dimensionality of Z and W,
the parameterization of I would in general be impractical or prohibitive. However, if we
can assume independent local mechanisms, P(w|z) can be decomposed into a product
P(w|z) = P(w_1|z_1) P(w_2|z_2) ... P(w_k|z_k), which renders I decomposable as well. Even
when full decomposition is not plausible, sparse couplings between the different noise
mechanisms would enable parsimonious parameterization using, for example, Bayesian
networks.
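One way to see why a product form for P(w|z) makes I decomposable is that the joint matrix M is then the Kronecker product of the local matrices, and the inverse of a Kronecker product is the Kronecker product of the inverses. The following sketch checks this for two binary components; the helper names and the numerical error rates are hypothetical.

```python
def inv2(m):
    """Closed-form inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Local mechanisms M_i[w_i][z_i] = P(w_i | z_i); the values are invented.
M1 = [[0.9, 0.2], [0.1, 0.8]]
M2 = [[0.95, 0.3], [0.05, 0.7]]

M = kron(M1, M2)               # joint P(w1, w2 | z1, z2) = P(w1|z1) P(w2|z2)
I = kron(inv2(M1), inv2(M2))   # built from the two small inverses only
identity_check = matmul(I, M)  # should be the 4x4 identity up to rounding
```

With k binary components, this representation needs k small inverses (2k error rates) rather than a full 2^k by 2^k matrix.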

The second challenge concerns the summations in Eq. (4) which, taken literally, call
for exponentially long summation over all values of w. In practice, however, this can be
mitigated since, for any given z, there will be only a small number of w's for which I(z, w) is
non-negligible. This computation, again, can be performed efficiently using Bayesian network
inference.

This still would not permit us to deal with the problem of empty cells which, owing
to the high dimensionality of Z and W, would prevent us from getting reliable statistics
of P(x, y, w), as required by (4). One should resort therefore to propensity score (PS)
methods, which map the cells of Z onto a single scalar.

The error-free propensity score L(z) = P(X = 1|Z = z), being a functional of P(x, y, z),
can of course be estimated consistently from samples of P(x, y, w) using the transformation
(3). Explicitly, we have:

$$\begin{aligned}
L(z) &= P(X = 1 \mid Z = z) \\
     &= P(X = 1, Z = z) / P(z) \\
     &= \sum_y P(X = 1, y, z) \Big/ \sum_{x, y} P(x, y, z)
\end{aligned}$$

where P(x, y, z) is given in (3).

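Using the decomposition in (2), we can further write:

$$\begin{aligned}
L(z) &= \sum_y P(X = 1, y, z) \Big/ \sum_{x, y} P(x, y, z) \\
     &= \sum_w I(z, w)\, P(X = 1, w) \Big/ \sum_w I(z, w)\, P(w) \\
     &= \sum_w I(z, w)\, L(w)\, P(w) \Big/ \sum_w I(z, w)\, P(w)
\end{aligned} \tag{6}$$

where L(w) is the error-prone propensity score

$$L(w) = P(X = 1 \mid W = w).$$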

We see that L(z) can be computed from I(z, w), L(w) and P(w). Thus, if we succeed
in estimating these three quantities in a parsimonious parametric form, the computation
of L(z) would be hindered only by the summations called for in (6). Once we estimate
L(w) parametrically for each conceivable w, Eq. (6) permits us to assign to each tuple z a
bias-free score L(z) that correctly represents the probability of X = 1 given Z = z. This, in
turn, should permit us to estimate, for each stratum L = l, the probability

$$P(l) = \sum_{z \,:\, L(z) = l} P(z)$$

then compute the causal effect using

$$P(y \mid do(x)) = \sum_l P(y \mid x, l)\, P(l).$$
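As a hedged illustration of Eq. (6), the sketch below computes the error-free score L(z) from I(z, w), L(w) and P(w) for a binary Z and W. The function name and all numbers are assumptions made for the example; in particular, the error rates 0.2 and 0.1 and the observed quantities are invented, and the population values are used in place of estimates.

```python
def error_free_propensity(I, L_w, P_w):
    """Eq. (6): L(z) = sum_w I(z,w) L(w) P(w) / sum_w I(z,w) P(w)."""
    scores = []
    for row in I:                      # one row of I per value of z
        num = sum(i * l * p for i, l, p in zip(row, L_w, P_w))
        den = sum(i * p for i, p in zip(row, P_w))
        scores.append(num / den)
    return scores

# Hypothetical binary example: P(w0|z1) = 0.2 and P(w1|z0) = 0.1.
eps, delta = 0.2, 0.1
d = 1 - eps - delta
I = [[(1 - eps) / d, -eps / d],        # I[z][w], the inverse of P(w|z)
     [-delta / d, (1 - delta) / d]]

P_w = [0.62, 0.38]                     # observed P(w)
L_w = [0.164 / 0.62, 0.236 / 0.38]     # observed error-prone score P(X=1 | w)

L_z = error_free_propensity(I, L_w, P_w)                     # restored P(X=1 | z)
P_z = [sum(i * p for i, p in zip(row, P_w)) for row in I]    # restored P(z)
```

The restored scores can then be used to group the values of z into strata L(z) = l, as described in the text.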

One technique for approximating P(l) was proposed by Stürmer et al. (2005), which did
not make full use of the inversion in (6) or of graphical methods facilitating this inversion.
A more promising approach would be to construct P(l) and P(y|x, l) directly from synthetic
samples of P(x, y, z) that can be created to mirror the empirical samples of P(x, y, w). This
is illustrated in the next section, using binary variables.
3  Effect  Restoration  in  Binary  Models

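Let X, Y, Z and W be binary variables, and let the noise mechanism be characterized by

$$P(W = 0 \mid Z = 1) = \epsilon, \qquad P(W = 1 \mid Z = 0) = \delta$$

To simplify notation, let the propositions Z = 1 and Z = 0 be denoted by z_1 and z_0,
respectively, and the same for W = 1 and W = 0, so that ε and δ can be written

$$\epsilon = P(w_0 \mid z_1), \qquad \delta = P(w_1 \mid z_0)$$

Equation (3) then translates to

$$\begin{aligned}
P(x, y, z_0) &= [(1 - \epsilon)\, P(x, y, w_0) - \epsilon\, P(x, y, w_1)] / (1 - \epsilon - \delta) \\
P(x, y, z_1) &= [-\delta\, P(x, y, w_0) + (1 - \delta)\, P(x, y, w_1)] / (1 - \epsilon - \delta)
\end{aligned} \tag{7}$$

which represents the inverse matrix

$$I(z, w) = \begin{pmatrix} 1 - \delta & \epsilon \\ \delta & 1 - \epsilon \end{pmatrix}^{-1}
          = \frac{1}{1 - \epsilon - \delta}
            \begin{pmatrix} 1 - \epsilon & -\epsilon \\ -\delta & 1 - \delta \end{pmatrix}$$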


Metaphorically, the transformation in (7) can be described as a mass re-assignment
process, as if every two cells, (x, y, w_0) and (x, y, w_1), compete on how to split their
combined weight P(x, y) between the two latent cells (x, y, z_0) and (x, y, z_1), thus creating a
synthetic population P(x, y, z) from which (4) follows. Figure 2 describes how P(w_1|x, y),
the fraction of the weight held by the (x, y, w_1) cell, determines the fraction P(z_1|x, y)
that is eventually received by cell (x, y, z_1). The complementary fraction, 1 - P(z_1|x, y), is
received, of course, by the twin cell (x, y, z_0), as shown in Fig. 2.

Clearly, when ε + δ = 1, W provides no information about Z and the inverse does not
exist. Likewise, whenever any of the synthetic probabilities P(x, y, z) falls outside the
(0, 1) interval, a modeling constraint is violated (see Pearl (1988, Chapter 8)), meaning
that the observed distribution P(x, y, w) and the postulated error mechanism P(w|z) are
incompatible with the structure of Fig. 1 (see footnote 4). If we assign reasonable priors
to ε and δ, the linear function in Fig. 2 will become an S-shaped curve over the entire
[0, 1] interval, and each sample (x, y, w) can then be used to update the relative weight
P(x, y, z_1)/P(x, y, z_0).
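The mass re-assignment of Eq. (7), together with the compatibility check just described, might be sketched as follows. This is a minimal illustration, not a prescribed implementation: the function name `restore_binary` and the numerical inputs are hypothetical, and the inputs stand for the two observed cell probabilities P(x, y, w_0) and P(x, y, w_1) of a single (x, y) pair.

```python
def restore_binary(p_w0, p_w1, eps, delta):
    """Eq. (7) for one (x, y) pair: returns (P(x,y,z0), P(x,y,z1))."""
    d = 1.0 - eps - delta
    if abs(d) < 1e-12:
        raise ValueError("eps + delta = 1: W carries no information about Z")
    p_z0 = ((1 - eps) * p_w0 - eps * p_w1) / d
    p_z1 = (-delta * p_w0 + (1 - delta) * p_w1) / d
    # The two cells always sum to P(x, y), so a value above 1 is impossible
    # once both are non-negative; checking the sign is therefore enough.
    if min(p_z0, p_z1) < 0:
        raise ValueError("negative mass: data conflict with the postulated eps, delta")
    return p_z0, p_z1

# Compatible case (numbers invented): mass 0.26 is split between z0 and z1.
z0, z1 = restore_binary(0.0772, 0.1828, eps=0.2, delta=0.1)

# Incompatible case: with eps = 0.2 the w0 cell cannot be this small.
try:
    restore_binary(0.01, 0.19, eps=0.2, delta=0.1)
    conflict_detected = False
except ValueError:
    conflict_detected = True
```

Raising an error on negative mass mirrors the situation described in footnote 4, where no estimate is produced because the postulated error mechanism contradicts the data.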

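To compute the causal effect P(y|do(x)) we need only substitute P(x, y, z) in Eq. (1),
which gives

$$P(y \mid do(x)) = \frac{1}{1 - \epsilon - \delta} \left[
\frac{P(x, y, w_1)}{P(x \mid w_1)}\,
\frac{\big[1 - \delta / P(w_1 \mid x, y)\big]\big[1 - \delta / P(w_1)\big]}{1 - \delta P(x) / P(x, w_1)}
+ \frac{P(x, y, w_0)}{P(x \mid w_0)}\,
\frac{\big[1 - \epsilon / P(w_0 \mid x, y)\big]\big[1 - \epsilon / P(w_0)\big]}{1 - \epsilon P(x) / P(x, w_0)}
\right] \tag{8}$$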
Figure 2: A curve describing how the weight P(x, y) is distributed to cells (x, y, z_1) and
(x, y, z_0), as a function of P(w_1|x, y). (Vertical axis: P(z_1|x, y).)


This expression highlights the difference between the standard and modified adjustment
for W; the former (Eq. (1)), which is valid if W = Z, is given by the standard inverse
probability weighting (e.g., Pearl, 2009, Eq. (3.11)):

$$P(y \mid do(x)) = \frac{P(x, y, w_1)}{P(x \mid w_1)} + \frac{P(x, y, w_0)}{P(x \mid w_0)}$$


The extra factors in Eq. (8) can be viewed as modifiers of the inverse probability weight
needed for a bias-free estimate. Alternatively, these terms can be used to assess, given ε and
δ, what bias would be introduced if we ignore errors altogether and treat W as a faithful
representation of Z.
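To illustrate how such modifiers act on the standard weights, the sketch below computes, for binary W, both the standard inverse-probability-weighted estimate and a corrected estimate obtained by substituting Eq. (7) into Eq. (1) and factoring out each standard term. The function name, argument layout, and numbers are assumptions made for the example; index 0 stands for w_0 (error rate ε) and index 1 for w_1 (error rate δ).

```python
def ipw_with_modifiers(p_xyw, p_xw, p_w, eps, delta):
    """Return (standard, corrected) estimates of P(y|do(x)) for binary W.

    p_xyw[w] = P(x, y, w), p_xw[w] = P(x, w), p_w[w] = P(w) for fixed x and y.
    """
    d = 1.0 - eps - delta
    p_xy, p_x = sum(p_xyw), sum(p_xw)
    rate = (eps, delta)          # error rate attached to the w0 and w1 cells
    standard = corrected = 0.0
    for w in (0, 1):
        term = p_xyw[w] * p_w[w] / p_xw[w]          # P(x,y,w) / P(x|w)
        modifier = ((1 - rate[w] * p_xy / p_xyw[w])  # 1 - rate / P(w|x,y)
                    * (1 - rate[w] / p_w[w])         # 1 - rate / P(w)
                    / (d * (1 - rate[w] * p_x / p_xw[w])))  # 1 - rate P(x)/P(x,w)
        standard += term
        corrected += term * modifier
    return standard, corrected

# Observed quantities for x = 1, y = 1 in an invented toy model whose true
# effect P(Y=1 | do(X=1)) equals 0.5, with eps = 0.2 and delta = 0.1.
standard, corrected = ipw_with_modifiers(
    p_xyw=[0.0772, 0.1828], p_xw=[0.164, 0.236], p_w=[0.62, 0.38],
    eps=0.2, delta=0.1)
bias_if_errors_ignored = standard - corrected
```

Setting both error rates to zero makes every modifier equal to one, so the two estimates coincide in the error-free case.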

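The infinitesimal approximation of (8), in the limit ε → 0, δ → 0, reads (to first order):

$$\begin{aligned}
P(y \mid do(x)) \approx{}& \frac{P(x, y, w_1)}{P(x \mid w_1)}
  \left[ 1 + \epsilon + \delta - \delta \left( \frac{1}{P(w_1 \mid x, y)} + \frac{1}{P(w_1)}
  - \frac{P(x)}{P(x, w_1)} \right) \right] \\
 &+ \frac{P(x, y, w_0)}{P(x \mid w_0)}
  \left[ 1 + \epsilon + \delta - \epsilon \left( \frac{1}{P(w_0 \mid x, y)} + \frac{1}{P(w_0)}
  - \frac{P(x)}{P(x, w_0)} \right) \right]
\end{aligned}$$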

We see that, even with two error parameters (ε and δ), and eight cells, the expression for
P(y|do(x)) does not simplify to provide an intuitive understanding of the effect of ε and δ
on the estimand. Such evaluation will be facilitated in the next example.


4  Effect  Restoration  in  Linear  Models 

Figure 3 depicts a linear version of the structural equation model (SEM) shown in Fig. 1.
Here, the task is to estimate the effect coefficient c_0, while the parameters c_3 and var(ε_W),
representing the noise mechanism W = c_3 Z + ε_W, are assumed given.

Figure 3: (a) A linear version of the model in Fig. 1. (b) A linear model with two indicators
for Z, permitting the identification of c_0.


Linear models offer two advantages in handling measurement errors. First, they
provide a more transparent picture of the role of each factor in the model. Second, certain
aspects of the error mechanism can often be identified without resorting to external studies.
This occurs, for example, when Z possesses two independent indicators, say W and V (as
in Fig. 3(b)), in which case the product c_3^2 var(Z) is identifiable and is given by:

$$c_3^2 \operatorname{var}(Z) = \frac{\operatorname{cov}(XW)\, \operatorname{cov}(WV)}{\operatorname{cov}(XV)} \tag{9}$$

As we shall see below, this product is sufficient for identifying c_0.

Equation (9) follows from Wright’s rules of path analysis and reflects the well known fact
(e.g., (Bollen, 1989, p. 224)) that, in linear models, structural parameters are identifiable
(up to a constant var(Z)) whenever each latent variable (in our case Z) has three
independent proxies (in our case X, W, and V).^6
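For readers who wish to retrace the path-analytic step, the product c_3^2 var(Z) can be obtained from three covariances, assuming (as a labeling introduced here, not taken from the text) that the second indicator obeys V = c_4 Z + ε_V and that all error terms are mutually independent and independent of Z:

```latex
\begin{aligned}
\operatorname{cov}(X, W) &= c_1 c_3 \operatorname{var}(Z), \\
\operatorname{cov}(W, V) &= c_3 c_4 \operatorname{var}(Z), \\
\operatorname{cov}(X, V) &= c_1 c_4 \operatorname{var}(Z), \\[4pt]
\frac{\operatorname{cov}(X, W)\, \operatorname{cov}(W, V)}{\operatorname{cov}(X, V)}
  &= \frac{c_1 c_3 \cdot c_3 c_4 \operatorname{var}^2(Z)}{c_1 c_4 \operatorname{var}(Z)}
   = c_3^2 \operatorname{var}(Z).
\end{aligned}
```

The coefficients c_1 and c_4 cancel, which is why neither needs to be known separately.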

Cai and Kuroki (2008) further showed that c_0 is identifiable from measurements of
three proxies (of Z), even when these proxies are dependent on each other. For example,
connecting W to X and V to Y still permits the identification of c_0. Similarly, the reader
can verify that adding an arrow from W to V in Fig. 3(b) does not hinder the identification
of c_0.

To find c_0 in the model of Fig. 3, we write the three structural equations in the model

$$\begin{aligned}
Y &= c_2 Z + c_0 X + \epsilon_Y \\
W &= c_3 Z + \epsilon_W \\
X &= c_1 Z + \epsilon_X
\end{aligned}$$

and express the structural parameters in terms of the variances and covariances of the
observed variables. This gives (after some algebra, and taking X to be scaled so that
var(X) = 1; otherwise the leading 1 in the denominator is replaced by var(X)):

$$c_0 = \frac{\operatorname{cov}(XY) - \operatorname{cov}(XW)\, \operatorname{cov}(WY) / c_3^2 \operatorname{var}(Z)}
             {1 - \operatorname{cov}^2(XW) / c_3^2 \operatorname{var}(Z)} \tag{10}$$

and shows that the pivotal quantity needed for the identification of c_0 is the product

$$c_3^2 \operatorname{var}(Z) = \operatorname{var}(W) - \operatorname{var}(\epsilon_W) \tag{11}$$

^6 This partial identifiability of the so-called “factor loadings” is not an impediment to the identification
of c_0. However, if we were in possession of only one proxy (as in Fig. 3(a)), then knowledge of c_3 alone would
be insufficient; the product c_3^2 var(Z) is required.


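If we are in possession of several proxies for Z, c_3^2 var(Z) can be estimated from the data,
as in Eq. (9), yielding:

$$c_0 = \frac{\operatorname{cov}(XY)\, \operatorname{cov}(WV) - \operatorname{cov}(YW)\, \operatorname{cov}(XV)}
             {\operatorname{var}(X)\, \operatorname{cov}(WV) - \operatorname{cov}(XW)\, \operatorname{cov}(XV)} \tag{12}$$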


If, however, Z has only one proxy, W, as in Fig. 3(a), the product c_3^2 var(Z) must be
estimated externally, using either a pilot study or judgmental assessment.
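The two routes to c_0 (an externally supplied product, or a second proxy) can be checked numerically against the covariances implied by the structural equations. The sketch below is an illustration under explicit assumptions: the second indicator is taken to be V = c_4 Z + ε_V, all error terms are mutually independent, the parameter values are invented, var(X) is kept explicit rather than set to 1, and population moments are used in place of sample estimates.

```python
def implied_moments(c0, c1, c2, c3, c4, var_z, var_ex, var_ew):
    """Population (co)variances implied by
       X = c1 Z + eX,  Y = c2 Z + c0 X + eY,  W = c3 Z + eW,  V = c4 Z + eV."""
    var_x = c1 ** 2 * var_z + var_ex
    return {
        "var_x": var_x,
        "var_w": c3 ** 2 * var_z + var_ew,
        "xy": c0 * var_x + c1 * c2 * var_z,
        "xw": c1 * c3 * var_z,
        "wy": c3 * (c2 + c0 * c1) * var_z,
        "xv": c1 * c4 * var_z,
        "wv": c3 * c4 * var_z,
    }

def c0_given_product(m, q):
    """c0 from the covariances when q = c3^2 var(Z) is supplied externally."""
    return (m["xy"] - m["xw"] * m["wy"] / q) / (m["var_x"] - m["xw"] ** 2 / q)

def c0_two_proxies(m):
    """c0 when q is itself identified from the second proxy V."""
    q = m["xw"] * m["wv"] / m["xv"]      # equals c3^2 var(Z) in this model
    return c0_given_product(m, q)

def c0_naive(m):
    """Ordinary adjustment for W, which amounts to pretending var(eW) = 0."""
    return c0_given_product(m, m["var_w"])

# Hypothetical parameter values (true c0 = 0.5, var(eW) = 0.5).
m = implied_moments(c0=0.5, c1=0.8, c2=1.2, c3=0.9, c4=0.7,
                    var_z=1.0, var_ex=0.36, var_ew=0.5)
external = c0_given_product(m, q=m["var_w"] - 0.5)   # q = var(W) - var(eW)
two_proxy = c0_two_proxies(m)
naive = c0_naive(m)
```

Both corrected routes return the true coefficient in this example, whereas treating W as if it were Z overstates it.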

The decomposition on the right hand side of Eq. (11) renders the judgmental assessment
of that product cognitively meaningful, since both c_3 and ε_W are causal parameters of the
error mechanism

W = c_3 Z + ε_W,

c_3 = E(W|z)/z measures the slope with which the average of W tracks the value of Z,
while var(ε_W) measures the dispersion of W around that average. var(W) can, of course,
be estimated from the data.

Under a Gaussian distribution assumption, c_3 and var(ε_W) fully characterize the
conditional density f(w|z) which, according to Section 2, is sufficient for restoring the joint
distribution of x, y, and z, and thus secures the identification of the causal effect, through
(1). This explains why the estimation of c_3 alone, be it from experimental data or our
understanding of the physics behind the error process, is not sufficient for neutralizing the
confounder Z. It also explains why the technique of “latent factor” analysis (Bollen, 1989)
is sufficient for identifying causal effects, even though it fails to identify the “factor loading”
c_3 separately from var(Z).

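In the noiseless case, i.e., var(ε_W) = 0, we have var(Z) = var(W)/c_3^2 and Eq. (10)
reduces to:

$$c_0 = \frac{\operatorname{cov}(XY) - \operatorname{cov}(XW)\, \operatorname{cov}(WY) / \operatorname{var}(W)}
             {1 - \operatorname{cov}^2(XW) / \operatorname{var}(W)}
      = \frac{\beta_{yx} - \beta_{yw} \beta_{wx}}{1 - \beta_{xw}^2}
      = \beta_{yx \cdot w} \tag{13}$$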


where β_{yx·w} is the coefficient of x in the regression of Y on X and W, or:

$$\beta_{yx \cdot w} = \frac{\partial}{\partial x} E(Y \mid x, w)$$

As expected, the equality c_0 = β_{yx·z} = β_{yx·w} assures a bias-free estimate of c_0 through
adjustment for W, instead of Z; c_3 plays no role in this adjustment.

In the error-prone case, c_0 can be written

$$c_0 = \frac{\beta_{yx} - \beta_{yw} \beta_{wx} / k}{1 - \beta_{xw}^2 / k}$$

where

$$k = 1 - \operatorname{var}(\epsilon_W) / \operatorname{var}(W)$$

and, as the formula reveals, c_0 cannot be interpreted in terms of an adjustment for a
surrogate variable V(W).
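A small numerical check may help in reading the role of k. The sketch below evaluates the error-prone expression from regression slopes; it is an illustration only, with a hypothetical function name and invented moments. The product β_xw β_wx is used in the denominator, which coincides with β_xw^2 when X and W are both standardized.

```python
def c0_from_slopes(cov_xy, cov_xw, cov_wy, var_x, var_w, var_ew):
    """c0 = (b_yx - b_yw b_wx / k) / (1 - b_xw b_wx / k), k = 1 - var(eW)/var(W)."""
    k = 1.0 - var_ew / var_w
    b_yx = cov_xy / var_x        # slope of Y on X
    b_yw = cov_wy / var_w        # slope of Y on W
    b_wx = cov_xw / var_x        # slope of W on X
    b_xw = cov_xw / var_w        # slope of X on W
    return (b_yx - b_yw * b_wx / k) / (1.0 - b_xw * b_wx / k)

# Moments of an invented linear model with c0 = 0.5 and var(eW) = 0.5.
moments = dict(cov_xy=1.46, cov_xw=0.72, cov_wy=1.44, var_x=1.0, var_w=1.31)

corrected = c0_from_slopes(var_ew=0.5, **moments)   # uses the true noise variance
noiseless = c0_from_slopes(var_ew=0.0, **moments)   # k = 1: plain adjustment for W
```

With k = 1 the expression collapses to the ordinary partial regression coefficient, which in this example is far from the true value.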

The strategy of adjusting for a surrogate variable has served as an organizing principle
for many studies in traditional measurement error analysis (Carroll et al., 2006). For
example, if one seeks to estimate the coefficient c_1 = E(X|z)/z through a proxy W of Z,
one can always choose to regress X on another variable, V, such that the slope of X on V,
E(X|v)/v, would yield an unbiased estimate of c_1. In our example of Fig. 3, one should
choose V to be the best linear estimate of Z, given W, namely V = αW, where

$$\alpha = \operatorname{cov}(ZW) / \operatorname{var}(W) = c_3 \operatorname{var}(Z) / \operatorname{var}(W)$$

is to be estimated separately, from a pilot study. However, this Two Stage Least Square
strategy is not applicable in adjusting for latent confounders; i.e., there is no variable V(W)
such that c_0 = β_{yx·v}.
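The contrast between estimating c_1 and estimating c_0 through a surrogate can be checked on population moments. The sketch below uses invented parameter values and covers only linear surrogates V = αW, so it illustrates the claim rather than proving it for arbitrary V(W).

```python
# Hypothetical structural parameters (all values invented for illustration).
c0, c1, c2, c3 = 0.5, 0.8, 1.2, 0.9
var_z, var_ex, var_ew = 1.0, 0.36, 0.5

# Implied population moments of the observed variables.
var_x = c1 ** 2 * var_z + var_ex
var_w = c3 ** 2 * var_z + var_ew
cov_xw = c1 * c3 * var_z
cov_wy = c3 * (c2 + c0 * c1) * var_z
cov_xy = c0 * var_x + c1 * c2 * var_z

# Surrogate V = alpha * W, the best linear estimate of Z given W.
alpha = c3 * var_z / var_w
cov_xv, cov_vy, var_v = alpha * cov_xw, alpha * cov_wy, alpha ** 2 * var_w

slope_x_on_w = cov_xw / var_w      # attenuated estimate of c1
slope_x_on_v = cov_xv / var_v      # recovers c1

# Adjusting for V when estimating the effect of X on Y: the partial
# regression coefficient is unchanged by rescaling W, so the bias remains.
beta_yx_given_v = (cov_xy - cov_xv * cov_vy / var_v) / (var_x - cov_xv ** 2 / var_v)
```

Rescaling W repairs the slope used for c_1 but leaves the partial coefficient of X untouched, which is consistent with the statement that the surrogate strategy does not carry over to latent confounders.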

5  Model  Testing  with  Measurement  Error 

When variables are measured without error, a structural equation model can be tested
and diagnosed systematically by examining how well the data agrees with each statistical
constraint that the model imposes on the joint distribution (or covariance matrix). The
most common of these constraints are conditional independence relations (or zero
partial correlations), and these can be read off the causal diagram through the d-separation
criterion (Pearl, 2009, pp. 335-7). For each missing edge in the diagram, say between
X and Y, the model dictates the conditional independence of X and Y given a set Z of
variables that d-separates X from Y in the diagram; these independencies can then be
tested individually and systematically to assure compatibility between model and data
before parameter identification commences.

When Z suffers from measurement errors (as in Fig. 1) those conditional independencies
are not testable, since the proxies of Z no longer d-separate X from Y. The question arises
whether surrogate tests exist through the available proxies, to detect possible violations of
the missing-edge postulate. The preceding section suggests such tests, provided we know
(or can estimate) the parameters of the error process.

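This is seen by substituting c_0 = 0 in Eq. (10), and accepting the vanishing of the
numerator as a surrogate test for d-separation between X and Y:

Theorem 1 If a latent variable Z d-separates two measured variables, X and Y, and Z has
a proxy W, W = cZ + ε_W, then cov(XY) must satisfy:

$$\begin{aligned}
\operatorname{cov}(XY) &= \operatorname{cov}(XW)\, \operatorname{cov}(WY) / c^2 \operatorname{var}(Z) \\
  &= \operatorname{cov}(XW)\, \operatorname{cov}(WY) / [\operatorname{var}(W) - \operatorname{var}(\epsilon_W)]
\end{aligned} \tag{14}$$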
We see that the usual condition of vanishing partial regression coefficient is replaced by
a modified condition, in which c^2 var(Z) needs to be estimated separately.
If the product c^2 var(Z) is estimated from other proxies of Z, as in Fig. 3(b), Eq. (14)
assumes the form of a TETRAD condition (Bollen, 1989, p. 304)

$$\operatorname{cov}(XY) = \operatorname{cov}(XV)\, \operatorname{cov}(WY) / \operatorname{cov}(WV)$$

Cai and Kuroki (2008) derive additional conditions under which this constraint applies to
multivariate sets of confounders and proxies.

Equation (14) can also be written

$$\operatorname{cov}\!\left[ Y,\; X - W\, \frac{\operatorname{cov}(XW)}{\operatorname{var}(W) - \operatorname{var}(\epsilon_W)} \right] = 0 \tag{15}$$

which provides an easy test of (14), in the style of Two Stage Least Square:

1. estimate a = var(W) - var(ε_W) (using a pilot study or auxiliary proxy variables)

2. collect samples X_i, Y_i, W_i, i = 1, 2, 3, ..., n

3. estimate c_1 = cov(XW)

4. Translate the data into fictitious samples V_i, i = 1, 2, 3, ..., n, with V_i = X_i - [cov(XW)/a] W_i

5. Compute (by Least Square) the best fit coefficient α in Y_i = α V_i + ε_i

6. Test if α = 0. If α vanishes with sufficiently high confidence, then the data is compatible
with the d-separation condition X ⊥⊥ Y | Z.
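The six steps can be sketched on simulated data as follows. Everything here is an assumption made for illustration: the structural coefficients, the Gaussian noise, the sample size, the helper names, and the treatment of a = var(W) - var(ε_W) as known exactly. The standard error is the naive least-squares one and ignores the sampling error in cov(XW).

```python
import random
from statistics import fmean

def cov(u, v):
    """Sample covariance (divisor n) of two equally long lists."""
    mu, mv = fmean(u), fmean(v)
    return fmean([(a - mu) * (b - mv) for a, b in zip(u, v)])

def surrogate_slope(xs, ys, ws, a):
    """Steps 3-5: slope of Y on V_i = X_i - (cov(XW)/a) W_i, with a naive s.e."""
    ratio = cov(xs, ws) / a
    vs = [x - ratio * w for x, w in zip(xs, ws)]
    var_v = cov(vs, vs)
    alpha = cov(ys, vs) / var_v
    my, mv = fmean(ys), fmean(vs)
    resid = [(y - my) - alpha * (v - mv) for y, v in zip(ys, vs)]
    se = (fmean([r * r for r in resid]) / (var_v * len(xs))) ** 0.5
    return alpha, se

def simulate(c0, n, rng):
    """Step 2: draw (X, Y, W) from a hypothetical linear model with latent Z."""
    xs, ys, ws = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        x = 0.8 * z + rng.gauss(0, 0.6)
        xs.append(x)
        ys.append(1.2 * z + c0 * x + rng.gauss(0, 1))
        ws.append(0.9 * z + rng.gauss(0, 0.5 ** 0.5))
    return xs, ys, ws

rng = random.Random(0)
a = 0.9 ** 2 * 1.0          # step 1: var(W) - var(eW) = c^2 var(Z), assumed known

# Z d-separates X and Y (no X -> Y edge): the slope should be near zero.
alpha_null, se_null = surrogate_slope(*simulate(0.0, 100_000, rng), a)
# An X -> Y edge with coefficient 0.5 is present: the slope should not vanish.
alpha_edge, se_edge = surrogate_slope(*simulate(0.5, 100_000, rng), a)
```

Step 6 would then compare each slope with its standard error; in practice the uncertainty in a and in cov(XW) should be propagated as well, for example by bootstrapping.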

Theorem 1 can be generalized to include missing edges between latent variables, as well
as between latent and observed variables. In fact, if the graph resulting from filling in a
missing edge permits the identification of the corresponding edge coefficient c, then the
original graph imposes a statistical constraint on the covariance matrix that can be used to
test the absence of that edge. Such tests should serve as model-diagnostic tools, before (or
instead of) submitting the entire model to a global test of fitness.


6  Conclusions 

The paper discusses computational and representational problems connected with effect
restoration when confounders are mismeasured or misclassified. In particular, we have
explicated how measurement bias can be removed by creating synthetic samples from
empirical samples, and how inverse-probability weighting can be modified to account for
measurement error. Subsequently, we have analyzed measurement bias in linear systems
and explicated graphical conditions under which such bias can be removed.